Chapter 11

Randomness and Complexity

Randomness is a concept deeply entangled with bioinformatics. A random sequence

cannot convey information, in the sense that it could be generated by a recipient

merely by tossing a coin. Randomness is therefore a kind of “null hypothesis”; a

random sequence of symbols is a sequence lacking all constraints limiting the variety

of choice of successive symbols selected from a pool with constant composition (i.e.,

an ergodic source). Such a sequence has maximum entropy in the Shannon sense;

that is, it has minimum redundancy.
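
As a rough illustration of the statement above, the following Python sketch computes the Shannon entropy per symbol of a source from its symbol probabilities, together with its redundancy. The function names, and the definition of redundancy as 1 - H/H_max with H_max taken as log2 of the alphabet size, are assumptions of this sketch and are not taken from the text.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits per symbol for the given symbol probabilities."""
    # Terms with p == 0 contribute nothing and are skipped.
    return sum(-p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """Redundancy 1 - H/H_max, with H_max = log2(alphabet size) (assumed definition)."""
    h_max = math.log2(len(probs))
    return 1 - shannon_entropy(probs) / h_max


fair_coin = [0.5, 0.5]      # unconstrained binary source
biased_coin = [0.9, 0.1]    # hypothetical biased source, for contrast

for name, probs in [("fair", fair_coin), ("biased", biased_coin)]:
    print(name, shannon_entropy(probs), redundancy(probs))
```

Under these assumptions the fair coin reaches the maximum of 1 bit per symbol with zero redundancy, whereas the biased coin falls below the maximum and therefore has nonzero redundancy.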

If we are using such an ideally random sequence as a starting point for assessing

departures from randomness, it is important to be able to recognize this ideal

randomness. How easy is this task? Consider the following three sequences:

1111111111111111111111111111111111

0101010101010101010101010101010101

1001010001010010101011110100101010

each of which could have been generated by tossing a coin. According to the results

from Chaps. 8 and 9, all three outcomes, indeed any sequence of 32 1s and 0s, have

equal probability of occurrence, namely $1/2^{32}$. Why do the first two not “look”

random? Kolmogorov supposed that the answer might belong to psychology; Borel

even asserted that the human mind is unable to simulate randomness (presumably the

ability to recognize patterns was—and is—important for our survival). Yet, apparent

pattern is also present in random sequences: van der Waerden has proved that in every

infinite binary sequence at least one of the two symbols must occur in arithmetical

progressions of every length. Hence, the first of the above three sequences would be

an unexceptionable occurrence in a much longer random sequence—in fact, whether

a given sequence is random is formally undecidable. At best, then, we can hope for

heuristic clues to the possible absence of randomness and, hence, presumably the

presence of meaning, in a gene sequence.
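
One possible heuristic clue of this kind can be sketched in Python. The sketch first confirms that the probability of a coin-toss sequence depends only on its length, and then estimates the empirical entropy of overlapping blocks of k symbols for the three sequences above. The helper names `coin_probability` and `block_entropy` are hypothetical, and block entropy is only one of many conceivable indicators; a low value suggests regularity, but a high value does not establish randomness.

```python
from collections import Counter
from fractions import Fraction
import math

SEQUENCES = [
    "1111111111111111111111111111111111",
    "0101010101010101010101010101010101",
    "1001010001010010101011110100101010",
]


def coin_probability(n):
    """Exact probability of any one particular sequence of n fair coin tosses."""
    return Fraction(1, 2 ** n)


def block_entropy(seq, k):
    """Empirical entropy (bits per block) of the overlapping k-symbol blocks in seq."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# n = 32 is the length quoted in the text; the value is the same for every arrangement.
print("probability of any one sequence:", coin_probability(32))

for seq in SEQUENCES:
    # k = 1 looks at symbol composition only; k = 2 also sees adjacent-pair structure.
    print(seq, block_entropy(seq, 1), block_entropy(seq, 2))
```

In this sketch the alternating sequence is indistinguishable from an unconstrained one at k = 1, because the two symbols occur equally often, and its regularity only shows up at k = 2, where just two of the four possible pairs ever occur.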

© Springer Nature Switzerland AG 2023

J. Ramsden, Bioinformatics, Computational Biology,

https://doi.org/10.1007/978-3-030-45607-8_11
